open source software
ttta: Tools for Temporal Text Analysis
Lange, Kai-Robin, Benner, Niklas, Grönberg, Lars, Hachcham, Aymane, Kolli, Imene, Rieger, Jonas, Jentsch, Carsten
In its current state, the ttta package includes diachronic embeddings, dynamic topic modeling, and document scaling. These tools can be used to track changes in language use, identify emerging topics, and explore how the meaning of words and phrases has evolved over time. Our dynamic topic modeling approach is based on RollingLDA (Rieger et al., 2021), a modification of the classic Latent Dirichlet Allocation (Blei et al., 2003) that estimates topics over time using a rolling window approach. We additionally implemented LDAPrototype (Rieger et al., 2020), which serves as a more consistent foundation for RollingLDA than a standard LDA. With these models, users can uncover and analyze topics of discussion in temporal data sets and track even rapid changes, which other dynamic topic models struggle with. This ability to track rapid changes in topics is further used in the Topical Changes model put forth by Rieger et al. (2022) and Lange et al. (2022), which identifies change points in the word-topic distribution of RollingLDA. Figure 1 visualizes the changes observed by the Topical Changes model in speeches from the German Bundestag (Lange & Jentsch, 2023). These changes can be analyzed further using leave-one-out word impacts provided by the model or, as Lange et al. (2025) proposed, by asking Large Language Models to interpret the change and relate it to a possible narrative shift.
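The rolling window idea mentioned above can be illustrated with a minimal sketch. It is not the ttta API and does not fit any topic model; it only shows how time-stamped documents might be grouped into monthly chunks, each paired with the documents of a few preceding chunks. The function name `rolling_windows`, the parameter name `memory`, and the monthly chunking are assumptions made for this illustration.

```python
from collections import defaultdict
from datetime import date


def rolling_windows(docs, memory=2):
    """Group (date, text) pairs by calendar month and pair each month
    with the texts of up to `memory` preceding non-empty months."""
    chunks = defaultdict(list)
    for day, text in docs:
        chunks[(day.year, day.month)].append(text)
    keys = sorted(chunks)
    windows = []
    for i, key in enumerate(keys):
        # Texts from the preceding chunks play the role of the "memory"
        # that a rolling model would look back on for the current chunk.
        past = [t for k in keys[max(0, i - memory):i] for t in chunks[k]]
        windows.append((key, past, chunks[key]))
    return windows


# Toy corpus; the texts are placeholders, not real data.
docs = [
    (date(2023, 1, 5), "budget debate"),
    (date(2023, 2, 9), "energy prices"),
    (date(2023, 2, 20), "energy subsidies"),
    (date(2023, 3, 2), "pension reform"),
    (date(2023, 4, 11), "migration policy"),
]
for key, past, current in rolling_windows(docs, memory=2):
    print(key, len(past), len(current))
```

In this sketch only months that contain documents form chunks, so a gap in the data shortens the look-back in calendar time; a real implementation may handle empty periods differently.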
- Europe > Ukraine (0.07)
- Europe > Switzerland > Zürich > Zürich (0.05)
- Europe > Russia (0.05)
Meta now allows military agencies to access its AI software. It poses a moral dilemma for everybody who uses it
Meta will make its generative artificial intelligence (AI) models available to the United States' government, the tech giant has announced, in a controversial move that raises a moral dilemma for everyone who uses the software. Meta last week revealed it would make the models, known as Llama, available to government agencies, "including those that are working on defence and national security applications, and private sector partners supporting their work". The decision appears to contravene Meta's own policy which lists a range of prohibited uses for Llama, including "[m]ilitary, warfare, nuclear industries or applications" as well as espionage, terrorism, human trafficking and exploitation or harm to children. Meta's exception also reportedly applies to similar national security agencies in the United Kingdom, Canada, Australia and New Zealand. It came just three days after Reuters revealed China has reworked Llama for its own military purposes.
- North America > United States (1.00)
- Oceania > New Zealand (0.25)
- Oceania > Australia (0.25)
EBIC: an open source software for high-dimensional and big data biclustering analyses
Orzechowski, Patryk, Moore, Jason H.
Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is added support for big data, which makes it possible to run large genomic data mining analyses efficiently. Additional enhancements include integration with R and Bioconductor and an option to remove the influence of missing values on the final result. Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset we observed a more than 6.6-fold speedup in computation time on a cluster of 8 GPUs compared to running the method on a single GPU. This demonstrates the high scalability of the algorithm. Availability: The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic . Installation and usage instructions are also available online.
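To put the reported speedup in context, a common summary is parallel efficiency, the speedup divided by the number of devices. The short sketch below applies that standard formula to the figures quoted above (6.6-fold on 8 GPUs); the function name is hypothetical and the calculation is not taken from the EBIC paper.

```python
def parallel_efficiency(speedup, n_devices):
    """Fraction of ideal linear scaling achieved: speedup / device count."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


# Figures quoted in the abstract: over 6.6x speedup on 8 GPUs vs. 1 GPU.
efficiency = parallel_efficiency(6.6, 8)
print(f"parallel efficiency: {efficiency:.1%}")
```

Because the abstract reports a speedup of "over" 6.6, the resulting value of roughly 82% is a lower bound on the efficiency for that dataset.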
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.15)
- Europe > Poland > Lesser Poland Province > Kraków (0.05)
- Information Technology > Artificial Intelligence > Machine Learning (0.96)
- Information Technology > Software (0.91)
- Information Technology > Data Science > Data Mining > Big Data (0.62)
TelescopeML -- I. An End-to-End Python Package for Interpreting Telescope Datasets through Training Machine Learning Models, Generating Statistical Reports, and Visualizing Results
Gharib-Nezhad, Ehsan, Batalha, Natasha E., Valizadegan, Hamed, Martinho, Miguel J. S., Habibi, Mahdi, Nookula, Gopal
We are in a new era of space exploration, thanks to advancements in ground- and space-based telescopes, such as the James Webb Space Telescope [JWST2023PASP] and CRIRES. These remarkable instruments collect high-resolution, high-signal-to-noise spectra from the atmospheres of extrasolar planets [Alderson2023Nature] and brown dwarfs [Miles2023ApJ]. Without accurate interpretation of these data, the main objectives of space missions will not be fully accomplished. Different analytical and statistical methods, such as the chi-squared test, Bayesian statistics, and radiative-transfer atmospheric modeling packages, have been developed [batalha2019picaso, MacDonald2023] to interpret the spectra. They use forward and/or retrieval radiative-transfer modeling to analyze the spectra and extract physical information, such as atmospheric temperature, metallicity, carbon-to-oxygen ratio, and surface gravity [line2014systematic, Iyer2023Sphinx, Marley2015]. These atmospheric models rely on generating the physics and chemistry of these atmospheres for a wide range of thermal structures and compositions. In addition to Bayesian-based techniques, machine learning and deep learning methods have been developed in recent years for various astronomical problems, including confirming the classification of light curves for exoplanet validation [Valizadegan2022], recognizing molecular features [Zingales2018ExoGAN], and interpreting brown dwarf spectra using the Random Forest technique [Lueber2023RandomForesr_BDs].
- North America > United States > California > Riverside County > Riverside (0.14)
- North America > United States > California > Santa Clara County > Mountain View (0.05)
- North America > Mexico > Sonora (0.05)
- Europe > Germany (0.05)
Can Artificial Intelligence be Open Sourced?
At what was billed as a "fireside chat" at Tel Aviv University in June 2023, the very first question from the audience posed to OpenAI CEO Sam Altman and chief scientist Ilya Sutskever was, "Could open source LLMs (large language models) potentially match GPT-4's abilities without additional technical advances, or is there a 'secret sauce' in GPT-4 unknown to the world that sets it apart from the other models?" After nervous laughter and applause, Sutskever said, "You don't want to think about it in binary black-and-white terms where there is a secret sauce that will never be rediscovered," adding that perhaps someday, an open source model would reproduce GPT-4--"but when it will be, there will be a much more powerful model in the companies, so there will always be a gap between the open source models and the private models, and this gap may even be increasing." In the ensuing months, despite Sutskever's caution that binary thinking about future AI development methods is too simplistic, numerous published pieces have taken diametrically opposed positions on whether open sourcing AI, particularly generative AI, is an imperative social necessity to counter corporate concentration or the opening of an existentially threatening Pandora's box of anarchic instructions on how to make weapons or promulgate disinformation on massive scales. Examples of these seemingly incompatible opinions include "Make No Mistake – AI Is Owned by Big Tech," published in MIT Technology Review, and "Open-Source AI Is Uniquely Dangerous," published in IEEE Spectrum. The question regarding the complex and nuanced reality around open source AI, especially in the context of large language models, however, is not whether or not it will emerge as a powerful force.
SelfEEG: A Python library for Self-Supervised Learning in Electroencephalography
Del Pup, Federico, Zanola, Andrea, Tshimanga, Louis Fabrice, Mazzon, Paolo Emilio, Atzori, Manfredo
SelfEEG is an open-source Python library developed to assist researchers in conducting Self-Supervised Learning (SSL) experiments on electroencephalography (EEG) data. Its primary objective is to offer a user-friendly but highly customizable environment, enabling users to efficiently design and execute self-supervised learning tasks on EEG data. SelfEEG covers all the stages of a typical SSL pipeline, ranging from data import to model design and training. It includes modules specifically designed to: split data at various granularity levels (e.g., session-, subject-, or dataset-based splits); effectively manage data stored with different configurations (e.g., file extensions, data types) during mini-batch construction; provide a wide range of standard deep learning models, data augmentations and SSL baseline methods applied to EEG data. Most of the functionalities offered by selfEEG can be executed both on GPUs and CPUs, expanding its usability beyond the self-supervised learning area. Additionally, these functionalities can be employed for the analysis of other biomedical signals often coupled with EEGs, such as electromyography or electrocardiography data. These features make selfEEG a versatile deep learning tool for biomedical applications and a useful resource in SSL, one of the currently most active fields of Artificial Intelligence.
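A subject-based split, one of the granularity levels mentioned above, keeps every recording of a given subject in the same partition so that no subject appears in both training and validation data. The following is a minimal, library-independent sketch of that idea; it does not use the selfEEG API, and the function name, the record layout, and the parameters are assumptions for illustration.

```python
import random


def split_by_subject(records, val_fraction=0.25, seed=0):
    """Split records so that each subject lands entirely in one partition."""
    subjects = sorted({r["subject"] for r in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(subjects)
    n_val = max(1, round(len(subjects) * val_fraction))
    val_subjects = set(subjects[:n_val])
    train = [r for r in records if r["subject"] not in val_subjects]
    val = [r for r in records if r["subject"] in val_subjects]
    return train, val


# Hypothetical file index: 4 subjects with 2 sessions each.
records = [
    {"subject": s, "session": k, "path": f"sub{s}_ses{k}.npy"}
    for s in range(1, 5)
    for k in range(1, 3)
]
train, val = split_by_subject(records)
print(len(train), len(val))
```

Splitting at the session or dataset level follows the same pattern with a different grouping key.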
PyVBMC: Efficient Bayesian inference in Python
Huggins, Bobby, Li, Chengkun, Tobaben, Marlon, Aarnos, Mikko J., Acerbi, Luigi
PyVBMC is a Python implementation of the Variational Bayesian Monte Carlo (VBMC) algorithm for posterior and model inference for black-box computational models (Acerbi, 2018, 2020). VBMC is an approximate inference method designed for efficient parameter estimation and model assessment when model evaluations are mildly-to-very expensive (e.g., a second or more) and/or noisy. Specifically, VBMC computes (i) a flexible (non-Gaussian) approximate posterior distribution of the model parameters, from which statistics and posterior samples can be easily extracted; and (ii) an approximation of the model evidence or marginal likelihood, a metric used for Bayesian model selection. PyVBMC can be applied to any computational or statistical model with up to roughly 10-15 continuous parameters, with the only requirement that the user can provide a Python function that computes the target log likelihood of the model, or an approximation thereof (e.g., an estimate of the likelihood obtained via simulation or Monte Carlo methods). PyVBMC is particularly effective when the model takes more than about a second per evaluation, with dramatic speed-ups of 1-2 orders of magnitude when compared to traditional approximate inference methods. Extensive benchmarks on both artificial test problems and a large number of real models from the computational sciences, particularly computational and cognitive neuroscience, show that VBMC generally - and often vastly - outperforms alternative methods for sample-efficient Bayesian inference, and is applicable to both exact and simulator-based models (Acerbi, 2018, 2019, 2020). PyVBMC brings this state-of-the-art inference algorithm to Python, along with an easy-to-use Pythonic interface for running the algorithm and manipulating and visualizing its results.
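The one requirement stated above is a Python function that returns the target log likelihood. As a rough illustration of what such a function can look like, the sketch below computes the log likelihood of i.i.d. Gaussian data for parameters `(mu, log_sigma)`. It does not call PyVBMC; the model, the function name, and the parameterization are assumptions chosen for this example, and how the function is handed to the library should be taken from the PyVBMC documentation.

```python
import math


def log_likelihood(theta, data):
    """Log likelihood of i.i.d. Gaussian data.

    theta = (mu, log_sigma); sigma is parameterized on the log scale so
    that both parameters are unconstrained real numbers.
    """
    mu, log_sigma = theta
    sigma = math.exp(log_sigma)
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (
        -0.5 * n * math.log(2.0 * math.pi)
        - n * log_sigma
        - squared_error / (2.0 * sigma**2)
    )


# Placeholder observations, not real measurements.
data = [1.2, 0.8, 1.1, 0.9]
print(log_likelihood((1.0, 0.0), data))
```

In a real application the body of this function would run the expensive model or simulator; the cheap Gaussian here only stands in for it.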
Locking Down Secure Open Source Software
Panic rippled through the cybersecurity world in early December 2021 as word spread about a newly discovered vulnerability in a piece of open source software used by millions. A string of code called Log4J, which instructs programs written in Java to create a record of program activity, would allow attackers to insert malicious code into programs. The flaw led to risks in software used by government agencies, Web service providers such as Amazon Web Services and Apple iCloud, and even video games such as Minecraft. In fact, within days of the first announcement, attackers used the flaw to get into the computer of the Suffolk County, NY, clerk's office. Over the next few months, they stole files and passwords, installed malware and crypto-currency mining software, and gained access to other county networks, including the health and sheriff's departments.
- North America > United States > New York > Suffolk County (0.25)
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Middlesex County > Lowell (0.05)
- Europe > Germany > Bavaria > Regensburg (0.05)
- Information Technology > Software (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Games (0.54)
Open Source Can Leverage Artificial Intelligence, Here Is How
Researchers and developers working on AI projects may find it easier to use open source software because it is typically less expensive than proprietary software. This may help to lower the cost of creating AI solutions, which may boost the field's advancement. Over the past few years, the importance of open source software in the realm of AI has increased significantly. Open source software has several advantages, one of which is the possibility for programmers to work together and exchange information. By adopting open source software, AI developers can build on the work of others and share their own contributions, which promotes faster innovation and growth in the field of AI. As a result, the field may advance more quickly, since programmers can collaborate and benefit from one another's contributions.
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence (1.00)
Lawsuit Takes Aim at the Way A.I. Is Built G.R. Jenkin & Associates
A programmer is suing Microsoft, GitHub and OpenAI over artificial intelligence technology that generates its own computer code. Video: Tom Smith, a veteran programmer, shows how Codex can instantly generate computer code from a request in plain English. Credit: Jason Henry for The New York Times. Cade Metz, based in San Francisco, writes about artificial intelligence and other emerging technologies. In late June, Microsoft released a new kind of artificial intelligence technology that could generate its own computer code. Called Copilot, the tool was designed to speed the work of professional programmers.
- North America > United States > California > San Francisco County > San Francisco (0.25)
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- North America > United States > California > Alameda County > Berkeley (0.05)
- Asia > China > Henan Province > Zhengzhou (0.05)
- Information Technology > Communications > Social Media (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)